Song Y, Kim T, Nowozin S, et al. Pixeldefend: Leveraging generative models to understand and defend against adversarial examples[J]. arXiv preprint arXiv:1710.10766, 2017.
1. Overview
1.1. Motivation
- adversarial examples mainly lie in the low probability regions of the training distribution
The paper proposes PixelDefend, which purifies a perturbed image by moving it back toward the training distribution before classification.
- statistical hypothesis testing shows that modern neural density models are good at detecting imperceptible perturbations
- accuracy under the strongest attack improves from 63% to 84% on Fashion MNIST and from 32% to 70% on CIFAR-10
1.2. Contribution
- show that generative models can be used to detect adversarially perturbed images, and observe that most adversarial examples lie in low probability regions
- introduce a novel family of defense methods; PixelDefend is one member of this family
- CIFAR-10 performance
1.3. Attack Methods
- Random Perturbation
- BIM (Basic Iterative Method)
- DeepFool
- CW (Carlini-Wagner)
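BIM is the attack these notes come back to most often, so a minimal sketch may help. It assumes a toy logistic-regression classifier instead of the ResNet used in the paper; `predict`, `bim_attack`, and the parameters `alpha` and `steps` are illustrative names, not taken from the paper. Each step moves the input along the sign of the loss gradient and projects it back into the ε-ball around the original.

```python
import math


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def predict(x, w, b):
    """Probability of class 1 under a toy logistic-regression classifier."""
    logit = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-logit))


def bim_attack(x, y, w, b, eps, alpha, steps):
    """Iteratively perturb x (values in [0, 1]) to increase the loss for label y."""
    x_adv = list(x)
    for _ in range(steps):
        p = predict(x_adv, w, b)
        # Gradient of the cross-entropy loss with respect to the input.
        grad = [(p - y) * wi for wi in w]
        stepped = [xa + alpha * sign(g) for xa, g in zip(x_adv, grad)]
        # Project back into the eps-ball around x and the valid range [0, 1].
        x_adv = [
            min(max(s, x0 - eps, 0.0), x0 + eps, 1.0)
            for s, x0 in zip(stepped, x)
        ]
    return x_adv


x = [0.5, 0.5]
x_adv = bim_attack(x, y=1, w=[2.0, -1.0], b=0.0, eps=0.1, alpha=0.02, steps=10)
print(x_adv, predict(x, [2.0, -1.0], 0.0), predict(x_adv, [2.0, -1.0], 0.0))
```

With a single step and `alpha = eps`, the same loop reduces to a one-shot sign-gradient attack.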
1.4. Defense Methods
1.4.1. Change Network & training procedure
- Adversarial Training.
- FGSM adversarial examples (most commonly used)
- training with BIM examples has succeeded on small datasets, but has been reported to fail on larger ones
- Label Smoothing (defensive distillation).
- convert one-hot labels to soft targets
- the correct class gets 1-ε; each of the other N-1 classes gets ε/(N-1), where N is the number of classes
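The soft-target rule above can be sketched in a few lines. The function name `smooth_labels` and the default `eps=0.1` are assumptions made for illustration only.

```python
def smooth_labels(label, num_classes, eps=0.1):
    """Turn an integer class label into a smoothed target vector."""
    if num_classes < 2:
        raise ValueError("label smoothing needs at least two classes")
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    # Every wrong class shares eps equally; the correct class keeps 1 - eps.
    target = [eps / (num_classes - 1)] * num_classes
    target[label] = 1.0 - eps
    return target


soft = smooth_labels(2, num_classes=5, eps=0.1)
print(soft)
```

The entries still sum to 1, so the result can be used directly as a cross-entropy target.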
1.4.2. Modify Adversarial Examples
- Feature Squeezing.
- reduces the color range from [0, 255] to a smaller range, then smooths the image with a median filter
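A dependency-free sketch of the two squeezing steps, assuming a grayscale image stored as a list of rows of 8-bit integers. The helper names, the 4-bit depth, and the 3x3 window are illustrative choices, not settings taken from the paper.

```python
from statistics import median_low


def reduce_bit_depth(image, bits):
    """Quantize 8-bit pixel values to `bits` bits, mapped back onto [0, 255]."""
    levels = (1 << bits) - 1
    return [
        [round(round(p / 255 * levels) / levels * 255) for p in row]
        for row in image
    ]


def median_filter(image, size=3):
    """Median-smooth a 2D image; border pixels use the part of the window inside the image."""
    h, w = len(image), len(image[0])
    r = size // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [
                image[a][b]
                for a in range(max(0, i - r), min(h, i + r + 1))
                for b in range(max(0, j - r), min(w, j + r + 1))
            ]
            # median_low always returns a value that occurs in the window.
            row.append(median_low(window))
        out.append(row)
    return out


def feature_squeeze(image, bits=4, size=3):
    """Apply bit-depth reduction followed by median smoothing."""
    return median_filter(reduce_bit_depth(image, bits), size)


noisy = [[100, 100, 100], [100, 255, 100], [100, 100, 100]]
squeezed = feature_squeeze(noisy)
print(squeezed)
```

In this toy input the single outlier pixel is removed by the median filter, and the remaining values snap to the nearest 4-bit level.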
1.5. Datasets
- Fashion MNIST
- CIFAR-10
1.6. Model
- ResNet
1.7. Detecting Adversarial Examples
- bits per dimension
- the distribution of log-likelihoods
- p-values (computed with PixelCNN)
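The detection quantities above can be sketched as follows, assuming the density model has already produced a total log-likelihood (in nats) for each image. The rank-based p-value shown here is one standard construction; the function names and the example scores are made up for illustration.

```python
import math


def bits_per_dim(log_likelihood_nats, num_dims):
    """Convert a total log-likelihood in nats to bits per dimension."""
    return -log_likelihood_nats / (num_dims * math.log(2))


def rank_p_value(train_scores, test_score):
    """Share of training scores that are <= the test score, with +1 smoothing.

    Scores are log-likelihoods, so a small p-value means the input is less
    likely than almost every training image.
    """
    n = len(train_scores)
    count = sum(1 for t in train_scores if t <= test_score)
    return (count + 1) / (n + 1)


# Hypothetical log-likelihoods of training images under the density model.
train_scores = [-3000.0, -2900.0, -2800.0, -2700.0]
print(rank_p_value(train_scores, -3500.0))  # unusually unlikely input
print(rank_p_value(train_scores, -2750.0))  # typical input
print(bits_per_dim(-2750.0, 32 * 32 * 3))
```

An input would be flagged as adversarial when its p-value falls below a chosen significance level.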
1.8. PixelDefend

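- trade-off: ε_defend is chosen to overestimate ε_attack, since too small a value cannot undo the perturbation while too large a value may alter the image's content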
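A minimal sketch of the greedy purification loop, assuming a grayscale image and a callable `cond_probs(image, i, j)` that returns 256 probabilities for pixel (i, j) given the pixels before it. `toy_cond_probs` is a hypothetical stand-in for a trained PixelCNN, and breaking ties toward the original value is an added assumption.

```python
def toy_cond_probs(image, i, j):
    """Hypothetical stand-in for PixelCNN: favors values near the left and upper neighbors."""
    neighbors = []
    if j > 0:
        neighbors.append(image[i][j - 1])
    if i > 0:
        neighbors.append(image[i - 1][j])
    if not neighbors:
        return [1.0 / 256] * 256
    center = sum(neighbors) / len(neighbors)
    weights = [1.0 / (1.0 + abs(z - center)) for z in range(256)]
    total = sum(weights)
    return [wt / total for wt in weights]


def pixel_defend(image, eps_defend, cond_probs):
    """Purify pixel by pixel in raster order, staying within eps_defend of the input."""
    purified = [row[:] for row in image]
    for i in range(len(image)):
        for j in range(len(image[0])):
            x = image[i][j]
            lo, hi = max(x - eps_defend, 0), min(x + eps_defend, 255)
            # Condition on the already purified pixels, not the raw input.
            probs = cond_probs(purified, i, j)
            # Most likely value in the allowed range; ties go to the value
            # closest to the original pixel (an assumption of this sketch).
            purified[i][j] = max(
                range(lo, hi + 1), key=lambda z: (probs[z], -abs(z - x))
            )
    return purified


clean = [[100] * 4 for _ in range(4)]
attacked = [row[:] for row in clean]
attacked[2][2] = 108  # small perturbation, within eps_attack = 8
purified = pixel_defend(attacked, eps_defend=16, cond_probs=toy_cond_probs)
print(purified)
```

Because the search range is centered on the input pixel, the purified image never differs from the input by more than `eps_defend` per pixel.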
1.9. Adaptive PixelDefend
- ε_defend = 0 if the input image's probability is below a threshold value
- otherwise ε_defend is a manually chosen setting
1.10. Defensive
- Attack with BIM: unrolling the PixelCNN yields a network that is too deep, which leads to vanishing gradients. The attack is also time consuming (10 hours to generate 100 attacking images with one TITAN Xp GPU)
- the optimization problem is not amenable to gradient descent
- PixelCNN and Classifier are trained separately and have independent parameters
1.11. Experiments
- p-values after applying PixelDefend